stacking
Stacked generalization is a rather different way of combining multiple models. It applies the idea of combining learners, but it is used less often than bagging and boosting, and unlike bagging and boosting it combines models of different types. The procedure is as follows:
1. Split the training data into two disjoint sets.
2. Train several base learners on the first set.
3. Test these learners on the second set.
4. Using the predictions from step 3 as inputs and the correct responses as outputs, train a higher-level learner.
Note that steps 1-3 amount to a cross-validation; instead of winner-takes-all, the base learners' outputs are combined, possibly non-linearly.
Below is the buildClassifier method of Weka's Stacking:
/**
 * Buildclassifier selects a classifier from the set of classifiers
 * by minimising error on the training data.
 *
 * @param data the training data to be used for generating the
 * boosted classifier.
 * @throws Exception if the classifier could not be built successfully
 */
public void buildClassifier(Instances data) throws Exception {
  if (m_MetaClassifier == null) {
    throw new IllegalArgumentException("No meta classifier has been set");
  }
  // can classifier handle the data?
  getCapabilities().testWithFail(data);
  // remove instances with missing class
  Instances newData = new Instances(data);
  m_BaseFormat = new Instances(data, 0);
  newData.deleteWithMissingClass();
  Random random = new Random(m_Seed);
  newData.randomize(random);
  if (newData.classAttribute().isNominal()) {
    newData.stratify(m_NumFolds);
  }
  // Create meta level
  generateMetaLevel(newData, random);
  // Rebuilt all the base classifiers on the full training data
  for (int i = 0; i < m_Classifiers.length; i++) {
    getClassifier(i).buildClassifier(newData);
  }
}
Training process

Next, generateMetaLevel trains all the base classifiers and builds the meta-level data:

protected void generateMetaLevel(Instances newData, Random random)
    throws Exception {
  Instances metaData = metaFormat(newData);
  m_MetaFormat = new Instances(metaData, 0);
  for (int j = 0; j < m_NumFolds; j++) {
    Instances train = newData.trainCV(m_NumFolds, j, random);
    // Build base classifiers
    for (int i = 0; i < m_Classifiers.length; i++) {
      getClassifier(i).buildClassifier(train);
    }
    // Classify test instances and add to meta data
    Instances test = newData.testCV(m_NumFolds, j);
    for (int i = 0; i < test.numInstances(); i++) {
      metaData.add(metaInstance(test.instance(i)));
    }
  }
  m_MetaClassifier.buildClassifier(metaData);
}
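The loop above performs an m_NumFolds-fold cross-validation, so every instance is held out exactly once and each meta-level example is an out-of-fold prediction. A minimal sketch of that partition invariant (plain Java, independent of Weka; the contiguous-block split is a simplification of what trainCV/testCV do):

```java
import java.util.*;

public class FoldSketch {

  // Return the test-fold index ranges of a k-fold split over n instances:
  // fold j holds out one contiguous block, and the remainder instances
  // are spread over the first n % k folds.
  public static int[][] testFolds(int n, int k) {
    int[][] folds = new int[k][];
    int start = 0;
    for (int j = 0; j < k; j++) {
      int size = n / k + (j < n % k ? 1 : 0);
      folds[j] = new int[size];
      for (int i = 0; i < size; i++) folds[j][i] = start + i;
      start += size;
    }
    return folds;
  }

  public static void main(String[] args) {
    // 10 instances, 3 folds: every index is held out exactly once
    int[][] folds = testFolds(10, 3);
    System.out.println(Arrays.deepToString(folds));
  }
}
```

Because each instance appears in exactly one test fold, the meta-level dataset ends up with one row per original instance, none of which was seen by the base classifiers that produced it.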
Generating the model

The code above builds the meta-level dataset: the data is split with m_NumFolds-fold cross-validation (ten folds by default) into training and test folds, the base classifiers are trained on each training fold and classify the corresponding test fold, and those predictions are collected into the meta-level dataset, on which m_MetaClassifier is finally trained.

protected Instances metaFormat(Instances instances) throws Exception {
  FastVector attributes = new FastVector();
  Instances metaFormat;
  for (int k = 0; k < m_Classifiers.length; k++) {
    Classifier classifier = (Classifier) getClassifier(k);
    String name = classifier.getClass().getName();
    if (m_BaseFormat.classAttribute().isNumeric()) {
      attributes.addElement(new Attribute(name));
    } else {
      for (int j = 0; j < m_BaseFormat.classAttribute().numValues(); j++) {
        attributes.addElement(new Attribute(name + ":" +
            m_BaseFormat.classAttribute().value(j)));
      }
    }
  }
  attributes.addElement(m_BaseFormat.classAttribute().copy());
  metaFormat = new Instances("Meta format", attributes, 0);
  metaFormat.setClassIndex(metaFormat.numAttributes() - 1);
  return metaFormat;
}
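metaFormat therefore fixes the width of the meta-level data: a numeric class contributes one attribute per base classifier, a nominal class with c values contributes c attributes per classifier, and the class attribute is appended last. A small sanity check of that count (plain Java sketch; the method name is my own, not Weka's):

```java
public class MetaFormatSketch {

  // Number of attributes in the meta-level dataset:
  // numeric class -> one predicted value per base classifier, plus the class
  // nominal class -> one probability per class value per classifier, plus the class
  public static int numMetaAttributes(int numClassifiers, boolean numericClass,
                                      int numClassValues) {
    int perClassifier = numericClass ? 1 : numClassValues;
    return numClassifiers * perClassifier + 1;
  }

  public static void main(String[] args) {
    // 3 base classifiers, nominal class with 2 values -> 3*2 + 1 = 7
    System.out.println(numMetaAttributes(3, false, 2));
  }
}
```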
Dataset transformation

As the code above shows, instances holds the level-0 training data. To match the training procedure, attributes must be added to hold the base classifiers' outputs: if the class attribute of m_BaseFormat is numeric, one attribute is added per base classifier (m_Classifiers.length in total); if it is nominal, each classifier adds as many attributes as the level-0 class attribute has values; finally the class attribute itself is appended to metaFormat and set as its class attribute. Below is the transformation of a single instance:
protected Instance metaInstance(Instance instance) throws Exception {
  double[] values = new double[m_MetaFormat.numAttributes()];
  Instance metaInstance;
  int i = 0;
  for (int k = 0; k < m_Classifiers.length; k++) {
    Classifier classifier = getClassifier(k);
    if (m_BaseFormat.classAttribute().isNumeric()) {
      values[i++] = classifier.classifyInstance(instance);
    } else {
      double[] dist = classifier.distributionForInstance(instance);
      for (int j = 0; j < dist.length; j++) {
        values[i++] = dist[j];
      }
    }
  }
  values[i] = instance.classValue();
  metaInstance = new Instance(1, values);
  metaInstance.setDataset(m_MetaFormat);
  return metaInstance;
}
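In other words, metaInstance concatenates each base classifier's output for one instance and appends the true class value. A self-contained sketch of that assembly, with the classifiers replaced by precomputed class distributions (hypothetical numbers, not real Weka output):

```java
import java.util.*;

public class MetaInstanceSketch {

  // Flatten the per-classifier class distributions into one meta-level
  // feature vector and append the true class value, as metaInstance does.
  public static double[] metaValues(double[][] distributions, double classValue) {
    int width = 0;
    for (double[] d : distributions) width += d.length;
    double[] values = new double[width + 1];
    int i = 0;
    for (double[] dist : distributions)
      for (double p : dist) values[i++] = p;
    values[i] = classValue;
    return values;
  }

  public static void main(String[] args) {
    // Two base classifiers on a two-class problem, true class = 1
    double[][] dists = {{0.8, 0.2}, {0.3, 0.7}};
    System.out.println(Arrays.toString(metaValues(dists, 1.0)));
    // [0.8, 0.2, 0.3, 0.7, 1.0]
  }
}
```

The meta classifier then learns from rows like this one, i.e. from how the base classifiers distribute their belief rather than from the original attributes.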